Design of Experiment - Full Factorial

2ˆk Factorial Design - Which factors matter?

lruolin
04-14-2021

Introduction

Design of Experiment (DOE) is something I wished I had the chance to learn in university or at my workplace when I was younger, it would have made me a more effective scientist. I took an “Advance Design of Experiment” course with Singapore Quality Institute (SQI) last year, and had a brief introduction to how DOE works. The instructor was using Minitab for his course, and I want to translate it to something that I can do with R.

Experimental design is important for:

((Miller and Miller 2005))

It is important to define clearly the purpose of the experiment before you can choose the type of experimental design.

DOE approach vs One Variable At a Time Approach

The advantages of using a DOE approach as compared to a OVAT approach are outlined below:(Marini 2013)

The full factorial designs are the simplest possible designs, and as a start, I will practice with it first.

Full Factorial Design (2ˆk)

Full factorial design requires 2ˆk number of experiments, where k refers to the number of variables in the study. As such, it is not practical to look at more than three factors as the number of experiments will exponentially increase. This is suitable for characterization of variables, and is usually done after screening of variables.

Workflow:

  1. Plan the experiments: define the factors, ranges, number of replicates
  2. Perform the experiments in a randomized manner
  3. Analyse the data by modelling and visualization

Load packages

library(pacman)
p_load(tidyverse, AlgDesign, ggfortify, jtools, ggstance, interactions,
       DoE.base, ggthemes, pid)

Case study

Let me use the worked example in https://www.itl.nist.gov/div898/handbook/pri/section3/pri3331.htm

  1. Outcome: Product uniformity
  2. Factors: Pressure (X1), Table Speed (X2), Down force (X3) No. of replicates: 2

Total number of runs: 2^3 x 2 = 16 runs.

Generating the design plan

AlgDesign package

The gen.factorial() function is easy to understand, but does not allow for creating replicates and randomizing the order.

# from AlgDesign package
rep_1 <- gen.factorial(c(2,2,2), # number of levels for the variables
              nVars = 3, # number of variables, in this case it is 3
              varNames = c("X1_pressure", "X2_table_speed", "X3_down_force"))

rep_1
  X1_pressure X2_table_speed X3_down_force
1          -1             -1            -1
2           1             -1            -1
3          -1              1            -1
4           1              1            -1
5          -1             -1             1
6           1             -1             1
7          -1              1             1
8           1              1             1
# to create two sets of replicates
rep_both <- rbind(rep_1, rep_1) 

rep_both
   X1_pressure X2_table_speed X3_down_force
1           -1             -1            -1
2            1             -1            -1
3           -1              1            -1
4            1              1            -1
5           -1             -1             1
6            1             -1             1
7           -1              1             1
8            1              1             1
9           -1             -1            -1
10           1             -1            -1
11          -1              1            -1
12           1              1            -1
13          -1             -1             1
14           1             -1             1
15          -1              1             1
16           1              1             1
# to randomize the order of experiments, use the function: sample()

glimpse(rep_both)
Rows: 16
Columns: 3
$ X1_pressure    <dbl> -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1,…
$ X2_table_speed <dbl> -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1,…
$ X3_down_force  <dbl> -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1…
# randomize all rows in dataframe and rename created column as "order"
rep_both$order <- sample(1:16) 

rep_both <- rep_both %>% 
  as_tibble() %>% 
  dplyr::select(order, everything())

glimpse(rep_both)
Rows: 16
Columns: 4
$ order          <int> 6, 16, 13, 4, 1, 2, 3, 10, 7, 14, 8, 5, 15, 1…
$ X1_pressure    <dbl> -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1,…
$ X2_table_speed <dbl> -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1,…
$ X3_down_force  <dbl> -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1…

Adding outcome to the dataframe

rep_both$outcome <- c(-3,0,-1,2,-1,2,1,6,
                      -1,-1,0,3,0,1,1,5)

rep_both
# A tibble: 16 x 5
   order X1_pressure X2_table_speed X3_down_force outcome
   <int>       <dbl>          <dbl>         <dbl>   <dbl>
 1     6          -1             -1            -1      -3
 2    16           1             -1            -1       0
 3    13          -1              1            -1      -1
 4     4           1              1            -1       2
 5     1          -1             -1             1      -1
 6     2           1             -1             1       2
 7     3          -1              1             1       1
 8    10           1              1             1       6
 9     7          -1             -1            -1      -1
10    14           1             -1            -1      -1
11     8          -1              1            -1       0
12     5           1              1            -1       3
13    15          -1             -1             1       0
14    12           1             -1             1       1
15     9          -1              1             1       1
16    11           1              1             1       5

Visualization

glimpse(rep_both)
Rows: 16
Columns: 5
$ order          <int> 6, 16, 13, 4, 1, 2, 3, 10, 7, 14, 8, 5, 15, 1…
$ X1_pressure    <dbl> -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1,…
$ X2_table_speed <dbl> -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1,…
$ X3_down_force  <dbl> -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1…
$ outcome        <dbl> -3, 0, -1, 2, -1, 2, 1, 6, -1, -1, 0, 3, 0, 1…
# boxplot for all data

rep_both %>% 
  dplyr::select(-order) %>% 
  pivot_longer(cols = c(starts_with("X")),
               names_to = "X_variables",
               values_to = "X_values") %>% 
  ggplot(aes(x = factor(X_values), y = outcome)) +
  geom_boxplot(aes(x = factor(X_values), y = outcome, fill = X_variables)) +
  scale_fill_few() +
  geom_point(col = "darkgrey", alpha = 0.5) +
  facet_grid(~ X_variables) +
  labs(x = "",
       title = "Plot of the main effects showing the outcome for each factor.",
       caption = "Source: http://www.itl.nist.gov/div898/handbook/") +
  theme_few() +
  theme(legend.position = "none")

Modelling - Multiple Linear Regression

glimpse(rep_both)
Rows: 16
Columns: 5
$ order          <int> 6, 16, 13, 4, 1, 2, 3, 10, 7, 14, 8, 5, 15, 1…
$ X1_pressure    <dbl> -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1,…
$ X2_table_speed <dbl> -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1,…
$ X3_down_force  <dbl> -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1…
$ outcome        <dbl> -3, 0, -1, 2, -1, 2, 1, 6, -1, -1, 0, 3, 0, 1…
model <- lm(outcome ~ X1_pressure * X2_table_speed * X3_down_force,
            data = rep_both)

# checking diagnostic plots for normality using ggfortify

autoplot(model)

Taken from: http://www.sthda.com/english/articles/39-regression-model-diagnostics/161-linear-regression-assumptions-and-diagnostics-in-r-essentials/

The diagnostic plots show residuals in four different ways:

Interpretation of model results

summary(model)

Call:
lm.default(formula = outcome ~ X1_pressure * X2_table_speed * 
    X3_down_force, data = rep_both)

Residuals:
   Min     1Q Median     3Q    Max 
  -1.0   -0.5    0.0    0.5    1.0 

Coefficients:
                                         Estimate Std. Error t value
(Intercept)                                0.8750     0.1976   4.427
X1_pressure                                1.3750     0.1976   6.957
X2_table_speed                             1.2500     0.1976   6.325
X3_down_force                              1.0000     0.1976   5.060
X1_pressure:X2_table_speed                 0.5000     0.1976   2.530
X1_pressure:X3_down_force                  0.2500     0.1976   1.265
X2_table_speed:X3_down_force               0.1250     0.1976   0.632
X1_pressure:X2_table_speed:X3_down_force   0.1250     0.1976   0.632
                                         Pr(>|t|)    
(Intercept)                              0.002205 ** 
X1_pressure                              0.000118 ***
X2_table_speed                           0.000227 ***
X3_down_force                            0.000977 ***
X1_pressure:X2_table_speed               0.035265 *  
X1_pressure:X3_down_force                0.241504    
X2_table_speed:X3_down_force             0.544737    
X1_pressure:X2_table_speed:X3_down_force 0.544737    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7906 on 8 degrees of freedom
Multiple R-squared:  0.9388,    Adjusted R-squared:  0.8853 
F-statistic: 17.54 on 7 and 8 DF,  p-value: 0.0002897

Another way to look at model information using jtools package

summ(model)
Observations 16
Dependent variable outcome
Type OLS linear regression
F(7,8) 17.54
0.94
Adj. R² 0.89
Est. S.E. t val. p
(Intercept) 0.88 0.20 4.43 0.00
X1_pressure 1.38 0.20 6.96 0.00
X2_table_speed 1.25 0.20 6.32 0.00
X3_down_force 1.00 0.20 5.06 0.00
X1_pressure:X2_table_speed 0.50 0.20 2.53 0.04
X1_pressure:X3_down_force 0.25 0.20 1.26 0.24
X2_table_speed:X3_down_force 0.12 0.20 0.63 0.54
X1_pressure:X2_table_speed:X3_down_force 0.12 0.20 0.63 0.54
Standard errors: OLS

Key points to take note of:

Main Effect Plot

plot_summs(model)

Another way of looking at model coefficients using the pid package:

pid::paretoPlot(model)

Probing interactions

i_1 <- interact_plot(model,
              pred = X1_pressure,
              modx = X2_table_speed)

i_2 <- interact_plot(model,
              pred = X1_pressure,
              modx = X3_down_force)

i_3 <- interact_plot(model,
              pred = X2_table_speed,
              modx = X3_down_force)

i_1
i_2
i_3

In this case, there is interaction, but not within the selected range for X1_pressure. When there is an increase in X1 and X2, Y will also increase.

There is no interaction between X1 and X3, and X2 and X3.

Concluding remarks

This exercise allowed me to go through the workflow for planning and analysing the results of a simple full factorial DOE. There are many packages in the R ecosystem that allows for DOE design planning, results interpretation and visualization, and it was fun trying to explore the different functions in various packages.

As a next step, I will try to find a more food science related example to work on, ideally one in which there are interactions between factors.

References

https://www.r-bloggers.com/2009/12/design-of-experiments-%E2%80%93-full-factorial-designs/

https://www.itl.nist.gov/div898/handbook/pri/section3/pri3331.htm

https://www.itl.nist.gov/div898/handbook/pri/section4/pri471.htm

Marini, Federico, ed. 2013. Chemometrics in Food Chemistry. First edition. Data Handling in Science and Technology, volume 28. Amsterdam ; Boston: Elsevier.
Miller, James N., and Jane Charlotte Miller. 2005. Statistics and Chemometrics for Analytical Chemistry. Pearson/Prentice Hall.

References

Citation

For attribution, please cite this work as

lruolin (2021, April 14). pRactice corner: Design of Experiment - Full Factorial. Retrieved from https://lruolin.github.io/myBlog/posts/20210414_DOE Full Factorial/

BibTeX citation

@misc{lruolin2021design,
  author = {lruolin, },
  title = {pRactice corner: Design of Experiment - Full Factorial},
  url = {https://lruolin.github.io/myBlog/posts/20210414_DOE Full Factorial/},
  year = {2021}
}